Deep neural networks, such as Deep-FSMN, have been widely studied for keyword spotting (KWS) applications. However, computational resources for these networks are significantly constrained, since they usually run on-call on edge devices. In this paper, we present BiFSMN, an accurate and extremely efficient binary neural network for KWS. We first construct a high-frequency enhancement distillation scheme for binarization-aware training, which emphasizes the high-frequency information of the full-precision network's representation that is more crucial for the optimization of the binarized network. Then, to allow instant and adaptive accuracy-efficiency trade-offs at runtime, we also propose a thinnable binarization architecture to further liberate the acceleration potential of the binarized network from the topology perspective. Moreover, we implement a fast bitwise computation kernel for BiFSMN on ARMv8 devices, which fully utilizes registers and increases instruction throughput to push the limit of deployment efficiency. Extensive experiments show that BiFSMN outperforms existing binarization methods by convincing margins on various datasets and is even comparable with its full-precision counterpart (e.g., less than a 3% drop on Speech Commands V1-12). We highlight that, benefiting from the thinnable architecture and the optimized 1-bit implementation, BiFSMN achieves an impressive 22.3x speedup and 15.5x storage saving on real-world edge hardware.
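The abstract does not spell out how the high-frequency information is isolated, but the distillation idea can be pictured with a short, hedged PyTorch sketch: here the high-frequency part of a hidden representation is approximated by subtracting an average-pooled (low-pass) version of it, and the binarized student is pulled toward the teacher's high-frequency component. The helper names high_freq and hed_loss, the pooling-based decomposition, and the MSE form are illustrative assumptions, not BiFSMN's actual implementation.

```python
import torch
import torch.nn.functional as F

def high_freq(feat: torch.Tensor, kernel: int = 3) -> torch.Tensor:
    """Approximate the high-frequency content of an FSMN-style hidden
    representation (batch, channels, time) by removing a low-pass component."""
    low = F.avg_pool1d(feat, kernel_size=kernel, stride=1, padding=kernel // 2)
    return feat - low

def hed_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """Distillation term that emphasizes the full-precision teacher's
    high-frequency information when training the binarized student."""
    return F.mse_loss(high_freq(student_feat), high_freq(teacher_feat).detach())
```

In training, such a term would be added to the ordinary KWS classification loss for a few matched student/teacher layers.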
Quantization is an effective method to facilitate hardware-friendly deep learning and to run deep neural networks on resource-limited hardware. However, it still causes a significant decrease in network accuracy. We summarize the challenges of quantization into two categories: quantization for diverse architectures and quantization for complex scenarios. Our research mainly focuses on applying quantization to various architectures and scenarios and on pushing the limit of quantization to extremely compress and accelerate networks. Comprehensive research on quantization will enable more powerful, more efficient, and more flexible hardware-friendly deep learning, and make it better suited to more real-world applications.
Model binarization is an effective method for compressing neural networks and accelerating their inference. However, a significant performance gap still exists between 1-bit models and their 32-bit counterparts. Empirical studies show that binarization causes information loss in both forward and backward propagation. We propose a novel Distribution-sensitive Information Retention Network (DIR-Net) that retains information in the forward and backward propagation by improving internal propagation and introducing external representations. DIR-Net mainly relies on three technical contributions: (1) Information Maximized Binarization (IMB): minimizing the information loss and the binarization error of weights/activations simultaneously through weight balance and standardization; (2) Distribution-sensitive Two-stage Estimator (DTE): retaining the information of gradients through a distribution-sensitive soft approximation that jointly considers the updating capability and accurate gradients; (3) Representation Binarization-aware Distillation (RBD): retaining the representation information by distilling the representations between the full-precision and binarized networks. DIR-Net investigates both the forward and backward processes of BNNs from a unified information perspective, providing new insight into the mechanism of network binarization. The three techniques in our DIR-Net are versatile and effective and can be applied in various structures to improve BNNs. Comprehensive experiments on image classification and object detection tasks show that our DIR-Net consistently outperforms state-of-the-art binarization methods under mainstream and compact architectures such as ResNet, VGG, EfficientNet, DARTS, and MobileNet. Additionally, we deploy our DIR-Net on real-world resource-limited devices, achieving an 11.1x storage saving and a 5.4x speedup.
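As a rough illustration of the weight balance and standardization behind IMB, the hedged PyTorch sketch below standardizes a weight tensor to zero mean and unit variance, binarizes it with sign, and rescales it by the mean absolute value; the backward pass uses a clipped straight-through estimator. The per-tensor scaling and the clipped estimator are common BNN conventions assumed here for brevity; DIR-Net's DTE replaces such a plain estimator with its distribution-sensitive two-stage soft approximation.

```python
import torch

class SignSTE(torch.autograd.Function):
    """Sign binarization with a clipped straight-through estimator."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Only propagate gradients where the input lies inside [-1, 1].
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)

def imb_weight(w: torch.Tensor) -> torch.Tensor:
    """Balance and standardize weights before binarization (IMB-style sketch)."""
    w_std = (w - w.mean()) / (w.std() + 1e-8)  # zero mean, unit variance
    scale = w_std.abs().mean()                 # per-tensor scaling factor
    return SignSTE.apply(w_std) * scale
```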
Recently, generative data-free quantization has emerged as a practical approach that compresses neural networks to low bit-width without access to real data. It quantizes networks by generating data from the batch normalization (BN) statistics of their full-precision counterparts. However, our study shows that in practice the synthetic data suffer from severe homogenization at both the distribution level and the sample level, which causes serious degradation of the quantized networks. This paper presents a generic Diverse Sample Generation (DSG) scheme for generative data-free post-training quantization and quantization-aware training, to mitigate this detrimental homogenization. In our DSG, we first slacken the statistics alignment for features in the BN layers to relax the distribution constraint. Then we strengthen the loss impact of specific BN layers for different samples and inhibit the correlation among samples during generation, diversifying samples from the statistical and spatial perspectives, respectively. Extensive experiments show that, for large-scale image classification tasks, our DSG consistently outperforms existing data-free quantization methods on various neural architectures, especially under ultra-low bit-widths (e.g., a 22% gain under the W4A4 setting). Moreover, the data diversification brought by our DSG yields general gains across various quantization methods, demonstrating that diversity is an important property of high-quality synthetic data for data-free quantization.
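The slackened statistics alignment can be pictured with a short, hedged PyTorch sketch: instead of forcing a generated batch to reproduce a BN layer's running statistics exactly, deviations within a slack margin go unpenalized, which leaves room for sample diversity. The function name slack_bn_alignment, the margin formulation, and the default delta are illustrative assumptions rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def slack_bn_alignment(feat: torch.Tensor,
                       bn_mean: torch.Tensor,
                       bn_var: torch.Tensor,
                       delta: float = 0.5) -> torch.Tensor:
    """Relaxed BN-statistics alignment for generated images.

    feat:             (batch, channels, H, W) activations entering a BN layer
                      of the full-precision model.
    bn_mean, bn_var:  the layer's running statistics, shape (channels,).
    delta:            slack margin; deviations inside it are not penalized.
    """
    mean = feat.mean(dim=(0, 2, 3))
    var = feat.var(dim=(0, 2, 3), unbiased=False)
    mean_gap = F.relu((mean - bn_mean).abs() - delta)
    var_gap = F.relu((var - bn_var).abs() - delta)
    return (mean_gap ** 2).mean() + (var_gap ** 2).mean()
```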
Quantization has become one of the most prevalent approaches to compress and accelerate neural networks. Recently, data-free quantization has been widely studied as a practical and promising solution. It synthesizes data to calibrate the quantized model according to the batch normalization (BN) statistics of the FP32 model, and significantly relieves the heavy dependency on real training data in traditional quantization methods. Unfortunately, we find that in practice the synthetic data constrained by BN statistics suffer from severe homogenization at both the distribution level and the sample level, which further causes a significant performance drop of the quantized model. We propose a Diverse Sample Generation (DSG) scheme to mitigate the adverse effects caused by this homogenization. Specifically, we slacken the alignment of feature statistics in the BN layers to relax the constraint at the distribution level, and design a layerwise enhancement to reinforce specific layers for different data samples. Our DSG scheme is versatile and can even be applied to modern post-training quantization methods such as AdaRound. We evaluate the DSG scheme on the large-scale image classification task and consistently obtain significant improvements over various network architectures and quantization methods, especially when quantizing to lower bits (e.g., up to 22% on W4A4). Moreover, benefiting from the enhanced diversity, models calibrated with synthetic data perform close to those calibrated with real data and even outperform them on W4A4.
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed the MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem-solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
Smart City applications, such as traffic monitoring and disaster response, often use swarms of intelligent and cooperative drones to efficiently collect sensor data over different areas of interest and time spans. However, when the required sensing becomes spatio-temporally large and varying, a collective arrangement of sensing tasks to a large number of battery-restricted and distributed drones is challenging. To address this problem, we introduce a scalable and energy-aware model for planning and coordination of spatio-temporal sensing. The coordination model is built upon a decentralized multi-agent collective learning algorithm (EPOS) to ensure scalability, resilience, and flexibility that existing approaches lack. Experimental results illustrate the outstanding performance of the proposed method compared to state-of-the-art methods. Analytical results contribute a deeper understanding of how coordinated mobility of drones influences sensing performance. This novel coordination solution is applied to traffic monitoring using real-world data to demonstrate a $46.45\%$ more accurate and $2.88\%$ more efficient detection of vehicles as the number of drones becomes a scarce resource.
Unbiased learning to rank (ULTR) studies the problem of mitigating various biases from implicit user feedback data such as clicks, and has been receiving considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, where click modeling is factorized into a relevance tower with regular input features, and a bias tower with bias-relevant inputs such as the position of a document. A successful factorization will allow the relevance tower to be exempt from biases. In this work, we identify a critical issue that existing ULTR methods ignored - the bias tower can be confounded with the relevance tower via the underlying true relevance. In particular, the positions were determined by the logging policy, i.e., the previous production model, which would possess relevance information. We give both theoretical analysis and empirical results to show the negative effects on relevance tower due to such a correlation. We then propose three methods to mitigate the negative confounding effects by better disentangling relevance and bias. Empirical results on both controlled public datasets and a large-scale industry dataset show the effectiveness of the proposed approaches.
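A minimal PyTorch sketch of the two-tower factorization described above may help: a relevance tower scores regular query-document features, a bias tower maps the logged position to a bias logit, and the two logits are added to form the click logit (trained, e.g., with a sigmoid cross-entropy loss on clicks). The feature dimension, hidden sizes, and additive combination follow common practice and are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class TwoTowerClickModel(nn.Module):
    """Additive two-tower click model: click logit = relevance logit + bias logit."""
    def __init__(self, rel_dim: int = 128, num_positions: int = 50):
        super().__init__()
        self.relevance_tower = nn.Sequential(
            nn.Linear(rel_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.bias_tower = nn.Embedding(num_positions, 1)  # position -> bias logit

    def forward(self, rel_features: torch.Tensor, positions: torch.Tensor):
        rel_logit = self.relevance_tower(rel_features).squeeze(-1)
        bias_logit = self.bias_tower(positions).squeeze(-1)
        return rel_logit + bias_logit  # train with nn.BCEWithLogitsLoss on clicks
```

At serving time only the relevance tower ranks documents; the confounding issue discussed above arises because logged positions already carry relevance information from the previous production model.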
Benefiting from its single-photon sensitivity, the single-photon avalanche diode (SPAD) array has been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge, due to the complex hardware manufacturing craft and the heavy noise disturbance of SPAD arrays. In this work, we introduce deep learning into SPAD, enabling super-resolution single-photon imaging over an order of magnitude, with significant enhancement of bit depth and imaging quality. We first studied the complex photon flow model of SPAD electronics to accurately characterize multiple physical noise sources, and collected a real SPAD image dataset (64 $\times$ 32 pixels, 90 scenes, 10 different bit depths, 3 different illumination fluxes, 2790 images in total) to calibrate the noise model parameters. With this real-world physical noise model, we for the first time synthesized a large-scale realistic single-photon image dataset (image pairs of 5 different resolutions up to megapixels, 17250 scenes, 10 different bit depths, 3 different illumination fluxes, 2.6 million images in total) for subsequent network training. To tackle the severe super-resolution challenge of SPAD inputs with low bit depth, low resolution, and heavy noise, we further built a deep transformer network with a content-adaptive self-attention mechanism and gated fusion modules, which can mine global contextual features to remove multi-source noise and extract full-frequency details. We applied the technique in a series of experiments including macroscopic and microscopic imaging, microfluidic inspection, and Fourier ptychography. The experiments validate the technique's state-of-the-art super-resolution SPAD imaging performance, with more than 5 dB superiority in PSNR over existing methods.
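The gated fusion idea can be sketched generically in PyTorch; the block below blends two feature branches with a learned per-pixel sigmoid gate and is only an illustrative stand-in, since the abstract does not describe the module's internals.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse two feature maps with a learned per-pixel gate in (0, 1)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid())

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([x, y], dim=1))
        return g * x + (1 - g) * y  # convex combination of the two branches
```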
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent years. This achievement can be ascribed in part to advances in AI subfields including Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). Deep learning, a sub-field of machine learning that employs artificial neural network concepts, has enabled the most rapid growth in these domains. As a result, the integration of vision and language has attracted a lot of attention, and tasks have been created in such a way that they properly exemplify the concepts of deep learning. In this review paper, we provide a thorough and extensive review of state-of-the-art approaches and key model design principles, and discuss existing datasets, methods, their problem formulations, and evaluation measures for VQA and visual reasoning tasks, in order to understand vision-and-language representation learning. We also present some potential future paths in this field of research, with the hope that our study may generate new ideas and novel approaches to handle existing difficulties and develop new applications.